We use the Dawid-Skene vote aggregation algorithm to obtain the ground truth label for each snippet, since this is often considered ‘gold standard’ for aggregation in practice. DawidSkene is an unsupervised inference algorithm that gives the Maximum Likelihood Estimate of observer error rates using the EM algorithm.
1) Using the labels given by multiple annotators, estimate the most likely “correct” label for each video snippet.
2) Based on the estimated correct answer for each object, compute the error rates for each annotator.
3) Taking into consideration the error rates for each annotator, recompute the most likely “correct” label for each object.
4) Repeat steps 2 and 3 until one of the termination criteria is met (error rates are below a pre-specified threshold or a pre-specified number of iterations are completed).